High resolution decision tree based acoustic modeling beyond CART

نویسندگان

  • Wu Chou
  • Wolfgang Reichl
چکیده

In this paper, an m-level optimal subtree based phonetic decision tree clustering algorithm is described. Unlike prior approaches, the m-level optimal subtree in the proposed approach is to generate log likelihood estimates using multiple mixture Gaussians for phonetic decision tree based state tying. It provides a more accurate model of the log likelihood variations in node splitting and it is consistent with the acoustic space partition introduced by the set of phonetic questions applied during the decision tree state tying process. In order to reduce the algorithmic complexity, a caching scheme based on previous search results is also described. It leads to a significant speed up of the m-level optimal subtree construction without degradation of the recognition performance, making the proposed approach suitable for large vocabulary speech recognition tasks. Experimental results on a standard (Wall Street Journal) speech recognition task indicate that the proposed m-level optimal subtree approach outperforms the conventional approach of using single mixture Gaussians in phonetic decision tree based state tying.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Forest Stand Types Classification Using Tree-Based Algorithms and SPOT-HRG Data

Forest types mapping, is one of the most necessary elements in the forest management and silviculture treatments. Traditional methods such as field surveys are almost time-consuming and cost-intensive. Improvements in remote sensing data sources and classification –estimation methods are preparing new opportunities for obtaining more accurate forest biophysical attributes maps. This research co...

متن کامل

Text-to-speech inspired duration modeling for improved whole-word acoustic models

In the construction of whole-word acoustic models, we have previously demonstrated substantial gains by using MAP estimation to introduce a simple prior model of phonetic timing. Based solely on the word’s phonetic (dictionary) pronunciation, this simple model included no information about the individual durations of constituent phones. However, the problem of modeling segmental duration has lo...

متن کامل

High accuracy acoustic modeling based on multi-stage decision tree

In many continuous speech recognition systems based on HMMs, decision tree-based state tying has been used for not only improving the robustness and accuracy of context dependent acoustic modeling but also synthesizing unseen models. To construct the phonetic decision tree, standard method has used just single Gaussian triphone models to cluster states. The coarse clusters generated using just ...

متن کامل

Early Detection of Douglas-Fir Beetle Infestation with Subcanopy Resolution Hyperspectral Imagery

Early detection of insect or pathogen infestations in forests would be useful to forest managers who want to make decisions that minimize timber losses. Typical methods of forest reconnaissance to detect infestations have included analysis of multispectral imagery. Multispectral imagery, however, often lacks the sensitivity to detect subtle changes in tree canopy reflectance because of physiolo...

متن کامل

Data-driven perceptually based join costs

Concatenative speech synthesis systems attempt to minimize audible discontinuities between two successive concatenated units. In unit selection concatenative synthesis, a join cost is calculated that is intended to predict the extent of audible discontinuity introduced by the concatenation of two specific units. A study was conducted that used human perceptual data on the detectability of mid-v...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1998